The knee-jerk mapping 

Peter G. Doyle Jim Reeds 

Version dated 5 October 1998 
GNU FDL* 



Abstract 

We claim to give the definitive theory of what we call the 'knee- 
jerk mapping', which is the basis for a class of optimization algorithms 
introduced by Baum, and promoted by Dempster, Laird, and Rubin 
under the name 'EM algorithm'. 

Introduction 

We give the definitive theory of the knee-jerk mapping, to be defined below. 
This mapping has been investigated by many people, most notably Baum 
([2], [3], [5], [4], [1]).

We begin with an example, taken from [6]. Suppose you want to locate 
the maximum of the function 

Z(x,2/) = xV'(l + 2x)i25 

on the 1-simplex (a fancy name for a line segment)

\[ \Sigma = \{x, y > 0;\ x + y = 1\}. \]

* Copyright (C) 1990, 1998 Peter G. Doyle. Permission is granted to copy, distribute 
and/or modify this document under the terms of the GNU Free Documentation License, 
as published by the Free Software Foundation; with no Invariant Sections, no Front-Cover 
Texts, and no Back-Cover Texts. 
One way you can find it is by iterating the knee-jerk mapping



\[ (x, y) \mapsto \frac{1}{x Z_x + y Z_y} (x Z_x, y Z_y) = \ldots \]

This maps the simplex $\Sigma$ to itself, and what is notable about the mapping is
that it increases the value of the objective function Z. 

The one true explanation of this ratcheting property of the knee-jerk map,
the explanation that lays bare once and for all what is going on here, is as
follows: Like any polynomial with only positive coefficients, the function Z
is log-log-convex; that is, log Z is convex as a function of (log x, log y); that
is,

\[ W(u, v) = \log Z(e^u, e^v) \]

is convex as a function of (u, v). We're trying to find the maximum of W on
the set

\[ T = \{e^u + e^v = 1\}. \]

Since W is convex, if we fix a point (u, v), the graph of W lies above its
tangent plane at (u, v, W(u, v)):

\[ W(\bar u, \bar v) - W(u, v) \geq W_u(u, v)(\bar u - u) + W_v(u, v)(\bar v - v). \]

Now ideally we'd like to move from (u, v) directly to the point $(\bar u, \bar v)$ of T where
$W(\bar u, \bar v)$ is greatest. What the knee-jerk mapping does is move instead to the
point where the lower bound on the right hand side of the inequality above
is maximized. This can't help increasing the objective function, right?

One remarkable fact should be pointed out, though it won't be gone into
below: While the function Z is log-log-convex, it is nevertheless log-concave;
that is, log Z is concave as a function of (x, y). (This is true because Z is a
product of homogeneous linear functions with positive coefficients.) Because
Z is log-concave, it has a unique maximum on the simplex $\Sigma$. While all
polynomials with positive coefficients are log-log-convex, only very special
polynomials are simultaneously log-concave.

A class of log-concave examples fundamentally more exciting than products
of linear functions can be obtained as follows: Take a connected graph
G, think of its edges as variables, form for each spanning tree of G a monomial
(of degree one smaller than the number of vertices of G), and form a
polynomial $D_G$, the discriminant of G, by adding up the monomials
corresponding to all spanning trees of G. For example, if G is a triangle with
edges x, y, z,

\[ D_G(x, y, z) = xy + xz + yz. \]

Discriminants of graphs are always log-concave. (If you know what a matroid 
is, let me add that the discriminant of a regular matroid is log-concave, but 
I don't know if the discriminant of a general matroid always is; my guess is 
that it isn't.) 
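
The spanning-tree recipe above can be tried out directly. The following is only a minimal sketch, not code from the paper: the function names `spanning_trees` and `discriminant`, and the encoding of a graph as a dictionary from edge names to endpoint pairs, are assumptions made for illustration. It enumerates all subsets of n - 1 edges and keeps the acyclic ones, which is fine for small graphs but far too slow for large ones.

```python
from itertools import combinations


def spanning_trees(num_vertices, edges):
    """Yield every spanning tree of a graph as a tuple of edge names.

    `edges` maps an edge name to its pair of endpoint vertices (0-based).
    """
    names = sorted(edges)
    for subset in combinations(names, num_vertices - 1):
        parent = list(range(num_vertices))  # union-find forest

        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v

        acyclic = True
        for name in subset:
            a, b = (find(v) for v in edges[name])
            if a == b:  # this edge would close a cycle
                acyclic = False
                break
            parent[a] = b
        if acyclic:  # n - 1 acyclic edges on n vertices form a spanning tree
            yield subset


def discriminant(num_vertices, edges, values):
    """Evaluate D_G at the given edge values (a dict from edge name to number)."""
    total = 0
    for tree in spanning_trees(num_vertices, edges):
        term = 1
        for name in tree:
            term *= values[name]
        total += term
    return total


# The triangle from the text, with edges x, y, z.
triangle = {"x": (0, 1), "y": (1, 2), "z": (0, 2)}
trees = list(spanning_trees(3, triangle))
```

For the triangle, `trees` lists the three pairs of edges, matching the three monomials xy, xz, yz.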

Discriminants of graphs are particular cases of the diagonal discriminants 
of Bott and Duffin; these are always log-concave (because the determinant 
function is log-concave when restricted to the set of positive-definite matrices) 
as well as being log-log-convex (because they are polynomials with positive
coefficients).

Knee-jerk functions 

In real n-space, we will denote the positive orthant by $\Pi$ and the open
standard simplex by $\Sigma$:

\[ \Pi = \{x_1, \ldots, x_n > 0\}, \]

\[ \Sigma = \{x_1, \ldots, x_n > 0;\ x_1 + \ldots + x_n = 1\}. \]

We denote their closures by $\bar\Pi$ (the non-negative orthant) and $\bar\Sigma$ (the closed
standard simplex).

We say that a function $Z(x_1, \ldots, x_n)$ from $\Pi$ to the positive real numbers
is log-log-convex if log Z is a convex function of $u_1 = \log x_1, \ldots, u_n = \log x_n$.
The name comes from the fact that in the case n = 1 a log-log-convex function
is one whose graph appears convex when drawn on log-log graph paper. We
say that Z is a knee-jerk function if Z is increasing (which we take to mean
what some would call 'non-decreasing') and log-log-convex. For pedantry's
sake we require in addition that Z be smooth, and extend continuously to $\bar\Pi$.

Properties and examples. 

There are many characterizations of convex functions, but for our purposes
the most important is that a function is convex if and only if its graph lies
above all of its tangent planes. Thus a smooth function Z is log-log-convex
if and only if for any two points $x = (x_1, \ldots, x_n)$ and $\bar x = (\bar x_1, \ldots, \bar x_n)$,

\begin{align*}
\log \bar Z - \log Z &\geq (\log Z)_{u_1}(\bar u_1 - u_1) + \ldots + (\log Z)_{u_n}(\bar u_n - u_n) \\
&= \frac{x_1 Z_{x_1}}{Z} \log\frac{\bar x_1}{x_1} + \ldots + \frac{x_n Z_{x_n}}{Z} \log\frac{\bar x_n}{x_n},
\end{align*}

where $\bar Z = Z(\bar x)$ and $(\log Z)_{u_1}$ denotes the derivative of log Z with respect
to $u_1$, etc.

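Using this characterization of log-log-convexity and Jensen's inequality
(which states that for a concave function like log the weighted average of the
values is littler than the value of the weighted average), we get a proof that
the function $Z(x_1, \ldots, x_n) = x_1 + \ldots + x_n$ is log-log-convex, and hence a
knee-jerk function:

\begin{align*}
\frac{x_1 Z_{x_1}}{Z} \log\frac{\bar x_1}{x_1} + \ldots + \frac{x_n Z_{x_n}}{Z} \log\frac{\bar x_n}{x_n}
&= \frac{x_1}{x_1 + \ldots + x_n} \log\frac{\bar x_1}{x_1} + \ldots + \frac{x_n}{x_1 + \ldots + x_n} \log\frac{\bar x_n}{x_n} \\
&\leq \log\left( \frac{x_1}{x_1 + \ldots + x_n} \frac{\bar x_1}{x_1} + \ldots + \frac{x_n}{x_1 + \ldots + x_n} \frac{\bar x_n}{x_n} \right) \\
&= \log \frac{\bar x_1 + \ldots + \bar x_n}{x_1 + \ldots + x_n} \\
&= \log \frac{\bar Z}{Z}.
\end{align*}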

Once we know that $x_1 + \ldots + x_n$ is a knee-jerk function, we can easily
produce a wealth of other examples by observing that the class of knee-jerk
functions is closed under a variety of operations. The coordinate functions
$x_1, \ldots, x_n$ are knee-jerk functions, as is any positive constant function.
Products, positive scalar multiples, and positive (possibly fractional) powers
of knee-jerk functions are knee-jerk functions. So is the composition
$Z(Z_1, \ldots, Z_k)$ of a knee-jerk function $Z(x_1, \ldots, x_k)$ with knee-jerk functions
$Z_1(x_1, \ldots, x_n), \ldots, Z_k(x_1, \ldots, x_n)$, because the composition of increasing
convex functions is increasing and convex. And since $x_1 + \ldots + x_n$ is a knee-jerk
function, it follows that sums of knee-jerk functions are knee-jerk functions.
Thus any non-zero polynomial with non-negative coefficients is a knee-jerk
function.
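
The claim that polynomials with non-negative coefficients are log-log-convex is easy to spot-check numerically. The sketch below is an illustration only: the polynomial, the encoding of a polynomial as a list of `(coefficient, exponents)` pairs, and the helper names are all assumptions, and a midpoint test on random points is evidence, not a proof.

```python
import math
import random


def poly(terms, x):
    """Evaluate the sum of c * prod(x_i ** e_i) over terms given as (c, exponents)."""
    return sum(c * math.prod(xi ** e for xi, e in zip(x, exps)) for c, exps in terms)


def log_log(terms, u):
    """W(u) = log Z(e^{u_1}, ..., e^{u_n})."""
    return math.log(poly(terms, [math.exp(ui) for ui in u]))


def midpoint_gap(terms, u, v):
    """(W(u) + W(v)) / 2 - W((u + v) / 2); non-negative when W is convex."""
    mid = [(a + b) / 2 for a, b in zip(u, v)]
    return (log_log(terms, u) + log_log(terms, v)) / 2 - log_log(terms, mid)


# Hypothetical example: Z(x, y, z) = xy + xz + yz + 3 x^2.
terms = [(1, (1, 1, 0)), (1, (1, 0, 1)), (1, (0, 1, 1)), (3, (2, 0, 0))]
rng = random.Random(0)
gaps = [
    midpoint_gap(
        terms,
        [rng.uniform(-2, 2) for _ in range(3)],
        [rng.uniform(-2, 2) for _ in range(3)],
    )
    for _ in range(1000)
]
```

Every entry of `gaps` should be non-negative up to rounding error, which is what convexity of W along each sampled segment requires at its midpoint.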
The knee-jerk mapping



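If $Z(x_1, \ldots, x_n)$ is a knee-jerk function, we define the knee-jerk mapping

\begin{align*}
T_Z(x) &= \frac{1}{x_1 Z_{x_1} + \ldots + x_n Z_{x_n}} (x_1 Z_{x_1}, \ldots, x_n Z_{x_n}) \\
&= \frac{Z}{x_1 Z_{x_1} + \ldots + x_n Z_{x_n}} (x_1 (\log Z)_{x_1}, \ldots, x_n (\log Z)_{x_n}) \\
&= \frac{1}{(\log Z)_{u_1} + \ldots + (\log Z)_{u_n}} ((\log Z)_{u_1}, \ldots, (\log Z)_{u_n}).
\end{align*}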

(If $Z_{x_1} = \ldots = Z_{x_n} = 0$, we define $T_Z(x_1, \ldots, x_n) = \frac{1}{x_1 + \ldots + x_n}(x_1, \ldots, x_n)$, or
just pretend we didn't notice.) Note that when Z is homogeneous of (possibly
fractional) degree d, Euler's identity

\[ x_1 Z_{x_1} + \ldots + x_n Z_{x_n} = dZ \]

implies that

\[ T_Z(x) = \frac{1}{dZ} (x_1 Z_{x_1}, \ldots, x_n Z_{x_n}). \]
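
As a worked instance of the homogeneous case, take for Z the triangle discriminant xy + xz + yz from the introduction, which is homogeneous of degree d = 2. The computation below is a sketch added for illustration, not part of the original text:

```latex
\begin{align*}
x Z_x + y Z_y + z Z_z &= x(y+z) + y(x+z) + z(x+y) = 2(xy + xz + yz) = 2Z, \\
T_Z(x, y, z) &= \frac{1}{2(xy + xz + yz)} \bigl( x(y+z),\ y(x+z),\ z(x+y) \bigr).
\end{align*}
```

The three components sum to 1, and the symmetric point (1/3, 1/3, 1/3) is mapped to itself.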

$T_Z$ maps the positive orthant $\Pi$ to the closed simplex $\bar\Sigma$, and thus restricts
to a mapping of $\Sigma$ to $\bar\Sigma$. It is easy to see that a point $x \in \Sigma$ is fixed by $T_Z$ if
and only if it is a critical point of Z on $\Sigma$. The great thing about the knee-jerk
mapping is that if x is not a critical point of Z on $\Sigma$ then $Z(T_Z(x)) > Z(x)$;
this will be proven in the next section. This makes the knee-jerk mapping a
natural to iterate if you are interested in finding the maximum of Z on $\bar\Sigma$.
The name 'knee-jerk' is partly meant to suggest the automatic way in which
the mapping increases the objective function Z.
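
To see the ratcheting in action, here is a minimal sketch of iterating the knee-jerk mapping for a polynomial with positive coefficients. The objective, the starting point, and the helper names `poly` and `knee_jerk` are hypothetical choices for illustration. The sketch relies on the fact that for a monomial c x^e, the quantity x_i times the i-th partial derivative is just e_i times the monomial, so no numerical differentiation is needed.

```python
import math


def poly(terms, x):
    """Evaluate Z(x) for Z given as a list of (coefficient, exponents) pairs."""
    return sum(c * math.prod(xi ** e for xi, e in zip(x, exps)) for c, exps in terms)


def knee_jerk(terms, x):
    """One step of T_Z for the polynomial Z given by `terms`."""
    n = len(x)
    weights = [0.0] * n  # weights[i] accumulates x_i * dZ/dx_i
    for c, exps in terms:
        mono = c * math.prod(xi ** e for xi, e in zip(x, exps))
        for i in range(n):
            weights[i] += exps[i] * mono
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical objective: Z(x, y) = x^2 y + 3 x y^3 + y^2.
terms = [(1, (2, 1)), (3, (1, 3)), (1, (0, 2))]
x = [0.9, 0.1]  # starting point on the simplex
values = [poly(terms, x)]
for _ in range(50):
    x = knee_jerk(terms, x)
    values.append(poly(terms, x))
```

The list `values` records Z along the orbit; it should never decrease, and each iterate stays on the simplex.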

The knee-jerk inequality

Write 

x' = T_Z(x)

and 

Z' = Z(x'). 
The knee-jerk inequality.

\[ \log\frac{Z'}{Z} \geq \frac{x_1 Z_{x_1} + \ldots + x_n Z_{x_n}}{Z} \left( x'_1 \log\frac{x'_1}{x_1} + \ldots + x'_n \log\frac{x'_n}{x_n} \right). \]

Proof. From the characterization of log-log-convexity above, we have

\[ \log \bar Z - \log Z \geq \frac{x_1 Z_{x_1}}{Z} \log\frac{\bar x_1}{x_1} + \ldots + \frac{x_n Z_{x_n}}{Z} \log\frac{\bar x_n}{x_n}. \]

Substituting $\bar x = x'$ and using $x_i Z_{x_i} = (x_1 Z_{x_1} + \ldots + x_n Z_{x_n}) x'_i$ yields the knee-jerk inequality. ∎

Recall (if you don't already know) that for probability vectors $x \in \Sigma$, $y \in \Sigma$
the I-divergence $I(y; x)$ is defined to be

\[ I(y; x) = y_1 \log\frac{y_1}{x_1} + \ldots + y_n \log\frac{y_n}{x_n}. \]

This quantity is always $\geq 0$, with equality if and only if x = y. (This follows
from an application of Jensen's inequality similar to that used above to show
that $x_1 + \ldots + x_n$ is a knee-jerk function.)
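
For concreteness, the I-divergence can be computed as follows. This is a small sketch; the function name `i_divergence` and the convention that terms with y_i = 0 contribute nothing are assumptions of the sketch, not statements from the text.

```python
import math


def i_divergence(y, x):
    """I(y; x) = sum_i y_i log(y_i / x_i), skipping terms with y_i = 0 (assumed convention)."""
    return sum(yi * math.log(yi / xi) for yi, xi in zip(y, x) if yi > 0)


# Two hypothetical probability vectors.
x = [0.5, 0.3, 0.2]
y = [0.2, 0.2, 0.6]
```

Note that the I-divergence is not symmetric: `i_divergence(y, x)` and `i_divergence(x, y)` are both positive here but differ in value.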

Corollary. If $x \in \Sigma$ then

\[ \log\frac{Z'}{Z} \geq \frac{x_1 Z_{x_1} + \ldots + x_n Z_{x_n}}{Z} I(x'; x) \geq 0. \]

In particular, $Z' > Z$ unless the point x is fixed by $T_Z$, which happens if and
only if x is a critical point of Z on $\Sigma$. ∎
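
The corollary can be checked numerically at random points of the simplex. The sketch below is illustrative only: the polynomial Z, the sampling scheme, and the names `monomials` and `corollary_sides` are assumptions. It computes both sides of the chain of inequalities for x' = T_Z(x).

```python
import math
import random


def monomials(terms, x):
    """Values of the individual monomials c * prod(x_i ** e_i) at x."""
    return [c * math.prod(xi ** e for xi, e in zip(x, exps)) for c, exps in terms]


def corollary_sides(terms, x):
    """Return (log(Z'/Z), (sum_i x_i Z_{x_i} / Z) * I(x'; x)) for x' = T_Z(x)."""
    monos = monomials(terms, x)
    Z = sum(monos)
    # weights[i] = x_i * dZ/dx_i, using x_i d/dx_i (c x^e) = e_i c x^e
    weights = [
        sum(m * exps[i] for m, (_, exps) in zip(monos, terms))
        for i in range(len(x))
    ]
    S = sum(weights)
    x_new = [w / S for w in weights]
    Z_new = sum(monomials(terms, x_new))
    div = sum(p * math.log(p / q) for p, q in zip(x_new, x) if p > 0)
    return math.log(Z_new / Z), (S / Z) * div


# Hypothetical Z(x, y, z) = 2xy + y^2 z + 5x^3.
terms = [(2, (1, 1, 0)), (1, (0, 2, 1)), (5, (3, 0, 0))]
rng = random.Random(1)
results = []
for _ in range(500):
    raw = [rng.uniform(0.05, 1.0) for _ in range(3)]
    x = [r / sum(raw) for r in raw]  # random point of the open simplex
    results.append(corollary_sides(terms, x))
```

Each pair in `results` should satisfy left side >= right side >= 0, up to rounding error.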



What is going on here? 

Say our goal is to maximize Z over $\Sigma$. We're sitting at some point x, and we
want to pick a new point $\bar x \in \Sigma$ so as to increase the objective function Z as
much as possible. Since Z is log-log-convex we know that

\[ \log \bar Z - \log Z \geq (\log Z)_{u_1}(\bar u_1 - u_1) + \ldots + (\log Z)_{u_n}(\bar u_n - u_n). \]

The knee-jerk idea is to choose $\bar x \in \Sigma$ so as to make the lower bound on the
right of this inequality as large as possible. That is, we want to do as well as
possible using only the value of Z and its derivatives at x and the knowledge
that Z is a knee-jerk function. So we want to choose $\bar x$ so as to maximize

\[ F(\bar u_1, \ldots, \bar u_n) = (\log Z)_{u_1}(\bar u_1 - u_1) + \ldots + (\log Z)_{u_n}(\bar u_n - u_n) \]

subject to the constraint

\[ G(\bar u_1, \ldots, \bar u_n) = e^{\bar u_1} + \ldots + e^{\bar u_n} = 1. \]

The maximum occurs where

\[ \nabla_{\bar u} G = (e^{\bar u_1}, \ldots, e^{\bar u_n}) = (\bar x_1, \ldots, \bar x_n) \]

is proportional to

\[ \nabla_{\bar u} F = ((\log Z)_{u_1}, \ldots, (\log Z)_{u_n}), \]

that is, where

\[ \bar x = T_Z(x). \]
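
The Lagrange-multiplier conclusion can be sanity-checked by brute force: the point proportional to the log-derivatives should beat every other point of the simplex on the linear lower bound. In the sketch below the vector `w` stands in for the log-derivatives of Z at x; its values, the point `x`, and the name `lower_bound` are made up for illustration.

```python
import math
import random


def lower_bound(w, x, x_bar):
    """F = sum_i w_i (log x_bar_i - log x_i), where w_i stands for (log Z)_{u_i}."""
    return sum(wi * (math.log(b) - math.log(a)) for wi, a, b in zip(w, x, x_bar))


w = [0.7, 1.1, 0.2]  # hypothetical non-negative log-derivatives at x
x = [0.2, 0.5, 0.3]  # current point of the simplex
best = [wi / sum(w) for wi in w]  # candidate maximizer: proportional to w

rng = random.Random(2)
trials = []
for _ in range(1000):
    raw = [rng.uniform(0.01, 1.0) for _ in range(3)]
    trials.append([r / sum(raw) for r in raw])  # random competitor on the simplex
worst_gap = min(lower_bound(w, x, best) - lower_bound(w, x, t) for t in trials)
```

Here `worst_gap` is the smallest margin by which `best` beats a random competitor; it should be non-negative. The bound at `best` is also at least the bound at `x` itself, which is zero.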

Ruminations. When $x \in \Sigma$, the fact that $x' = T_Z(x)$ maximizes the
lower bound for $\bar Z$ implies right away that $Z' \geq Z$ (at $\bar x = x$ the lower
bound is Z itself, so its maximum is at least Z), independently of the
hocus-pocus with the I-divergence. Indeed, the positivity of the I-divergence
can now be seen as a consequence of the fact that $x_1 + \ldots + x_n$ is a knee-jerk
function. This is not so surprising, perhaps, since both facts followed from
very similar applications of Jensen's inequality. But now it appears that
$x_1 + \ldots + x_n$ is somehow the most important of all knee-jerk functions. And
why should it be so distinguished? Because it crops up in the definition of
the simplex $\Sigma$.

Generalizations 

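Given $a = (a_1, \ldots, a_n)$, $a_1, \ldots, a_n > 0$, define

\[ \Sigma_a = \{x_1, \ldots, x_n > 0;\ a_1 x_1 + \ldots + a_n x_n = 1\} \]

and define

\[ T_{Z,a} : \Pi \to \Sigma_a, \]

\[ x \mapsto T_{Z,a}(x) = \frac{1}{x_1 Z_{x_1} + \ldots + x_n Z_{x_n}} \left( \frac{x_1 Z_{x_1}}{a_1}, \ldots, \frac{x_n Z_{x_n}}{a_n} \right). \]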

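Then, with $x' = T_{Z,a}(x)$ and $Z' = Z(x')$, the knee-jerk inequality becomes

\[ \log\frac{Z'}{Z} \geq \frac{x_1 Z_{x_1} + \ldots + x_n Z_{x_n}}{Z} \left( a_1 x'_1 \log\frac{x'_1}{x_1} + \ldots + a_n x'_n \log\frac{x'_n}{x_n} \right). \]

When $x \in \Sigma_a$ this becomes

\[ \log\frac{Z'}{Z} \geq \frac{x_1 Z_{x_1} + \ldots + x_n Z_{x_n}}{Z} I((a_1 x'_1, \ldots, a_n x'_n); (a_1 x_1, \ldots, a_n x_n)) \geq 0. \]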
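
A minimal sketch of the weighted variant follows, assuming the mapping divides the i-th component x_i Z_{x_i} by a_i times the sum of all the x_j Z_{x_j}, so that the image satisfies the weighted constraint. The polynomial, the weights `a`, the starting point, and the function names are hypothetical.

```python
import math


def poly(terms, x):
    """Evaluate Z(x) for Z given as a list of (coefficient, exponents) pairs."""
    return sum(c * math.prod(xi ** e for xi, e in zip(x, exps)) for c, exps in terms)


def knee_jerk_weighted(terms, a, x):
    """One step of the weighted mapping: x_i Z_{x_i} / (a_i * sum_j x_j Z_{x_j})."""
    n = len(x)
    weights = [0.0] * n  # weights[i] accumulates x_i * dZ/dx_i
    for c, exps in terms:
        mono = c * math.prod(xi ** e for xi, e in zip(x, exps))
        for i in range(n):
            weights[i] += exps[i] * mono
    total = sum(weights)
    return [w / (ai * total) for w, ai in zip(weights, a)]


# Hypothetical Z(x, y) = x y^2 + 4 x^2 + y, with constraint 2x + y/2 = 1.
terms = [(1, (1, 2)), (4, (2, 0)), (1, (0, 1))]
a = [2.0, 0.5]
x = [0.1, 1.6]  # satisfies 2 * 0.1 + 0.5 * 1.6 = 1
values = [poly(terms, x)]
for _ in range(40):
    x = knee_jerk_weighted(terms, a, x)
    values.append(poly(terms, x))
```

Each iterate satisfies the weighted constraint, and `values` should be non-decreasing, as the inequality above predicts.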

More interesting, we can replace the simplex $\Sigma$ with a product of simplices:
Let

\[ Z = Z(x_{1,1}, \ldots, x_{1,n_1}, \ldots, x_{k,1}, \ldots, x_{k,n_k}). \]



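Let

\[ T = \{x_{i,j} > 0;\ x_{i,1} + \ldots + x_{i,n_i} = 1 \text{ for } i = 1, \ldots, k\} \]

and define

\[ T_Z : \Pi \to T \]

by

\[ x'_{i,j} = \frac{x_{i,j} Z_{x_{i,j}}}{\sum_l x_{i,l} Z_{x_{i,l}}}. \]

Then

\[ \log\frac{Z'}{Z} \geq \sum_i \frac{\sum_j x_{i,j} Z_{x_{i,j}}}{Z} \left( x'_{i,1} \log\frac{x'_{i,1}}{x_{i,1}} + \ldots + x'_{i,n_i} \log\frac{x'_{i,n_i}}{x_{i,n_i}} \right), \]

and when $x \in T$,

\[ \log\frac{Z'}{Z} \geq \sum_i \frac{\sum_j x_{i,j} Z_{x_{i,j}}}{Z} I((x'_{i,1}, \ldots, x'_{i,n_i}); (x_{i,1}, \ldots, x_{i,n_i})) \geq 0. \]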
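
The product-of-simplices version can be sketched the same way, assuming the mapping normalizes the quantities x_{i,j} Z_{x_{i,j}} separately within each block of variables. The flattened indexing, the `blocks` list, the polynomial, and the function names below are illustrative assumptions.

```python
import math


def monomials(terms, x):
    """Values of the individual monomials c * prod(x_i ** e_i) at x."""
    return [c * math.prod(xi ** e for xi, e in zip(x, exps)) for c, exps in terms]


def knee_jerk_blocks(terms, blocks, x):
    """Normalize x_i * dZ/dx_i separately within each block of coordinates."""
    monos = monomials(terms, x)
    weights = [
        sum(m * exps[i] for m, (_, exps) in zip(monos, terms))
        for i in range(len(x))
    ]
    x_new = list(x)
    for block in blocks:
        block_total = sum(weights[i] for i in block)
        for i in block:
            x_new[i] = weights[i] / block_total
    return x_new


# Hypothetical Z in five variables, flattened as (x11, x12, x21, x22, x23).
terms = [
    (1, (1, 0, 1, 0, 0)),
    (2, (0, 1, 0, 1, 0)),
    (1, (1, 1, 0, 0, 1)),
    (3, (0, 2, 1, 0, 0)),
]
blocks = [[0, 1], [2, 3, 4]]  # one simplex per block
x = [0.5, 0.5, 0.2, 0.3, 0.5]
values = [sum(monomials(terms, x))]
for _ in range(40):
    x = knee_jerk_blocks(terms, blocks, x)
    values.append(sum(monomials(terms, x)))
```

Each block of `x` stays on its own simplex, and `values` should be non-decreasing along the iteration.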